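Communication efficiency plays an important role in accelerating the distributed training of deep neural networks (DNNs). All-reduce is the key communication primitive for reducing model parameters in distributed DNN training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, which cannot meet the communication requirements of distributed training for large DNNs. One promising alternative to electrical interconnects is optical interconnects, which can provide high bandwidth, low transmission delay, and low power cost. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing the all-reduce operation in optical interconnect systems, which can take advantage of WDM (wavelength division multiplexing) to reduce the communication time of distributed data-parallel DNN training. We further derive the minimum number of communication steps and the communication time required to realize all-reduce using WRHT. Simulation results show that, compared with three traditional all-reduce algorithms simulated in an optical interconnect system, WRHT reduces communication time by 75.59%, 49.25%, and 70.1%, respectively. Simulation results also show that, compared with two existing all-reduce algorithms in an electrical interconnect system, WRHT reduces the communication time of the all-reduce operation by 86.69% and 84.71%.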
Translated by Google Translate
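To make the all-reduce primitive in the WRHT abstract concrete, here is a minimal sketch of a generic binary-tree all-reduce simulated in plain Python. It is not the WRHT scheme: it ignores wavelength assignment and optical hardware entirely, and the function name `tree_all_reduce` is hypothetical. It only illustrates what "reduce, then broadcast" means and how communication steps can be counted.

```python
def tree_all_reduce(worker_grads):
    """Sum equal-length gradient vectors across workers using a binary tree.

    Generic illustration only (assumption: NOT the WRHT algorithm).
    Returns (per-worker result buffers, number of communication steps),
    where one step is a round of pairwise transfers that could run in parallel.
    """
    n = len(worker_grads)
    bufs = [list(g) for g in worker_grads]  # copy so inputs stay untouched
    steps = 0

    # Reduce phase: at each level, worker `dst` absorbs the partial sum of
    # the worker `stride` positions to its right.
    stride = 1
    while stride < n:
        for dst in range(0, n, 2 * stride):
            src = dst + stride
            if src < n:
                bufs[dst] = [a + b for a, b in zip(bufs[dst], bufs[src])]
        stride *= 2
        steps += 1

    # Broadcast phase: mirror the reduce tree so every worker gets the total.
    stride //= 2
    while stride >= 1:
        for src in range(0, n, 2 * stride):
            dst = src + stride
            if dst < n:
                bufs[dst] = list(bufs[src])
        stride //= 2
        steps += 1

    return bufs, steps


if __name__ == "__main__":
    grads = [[1, 2], [3, 4], [5, 6], [7, 8]]
    results, steps = tree_all_reduce(grads)
    print(results, steps)
```

In this sketch the step count grows as twice the tree depth. The abstract states that WRHT derives its own minimum number of steps; that derivation is not reproduced here.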
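This work aims to transfer a generative adversarial network (GAN) pre-trained on one image domain to a new domain using as few as one target image. The main challenge is that, under such limited supervision, it is extremely difficult to synthesize photo-realistic and highly diverse images while capturing the representative characteristics of the target. Unlike existing approaches that adopt a vanilla fine-tuning strategy, we add two lightweight modules, one to the generator and one to the discriminator. Specifically, we introduce an attribute adaptor into the generator while freezing its original parameters, so that it can reuse prior knowledge as much as possible and hence maintain synthesis quality and diversity. We then equip the well-learned discriminator backbone with an attribute classifier to ensure that the generator captures the appropriate characteristics from the reference. Furthermore, considering the poor diversity of the training data (i.e., as few as one image), we also propose constraining the diversity of the generative domain during training, which alleviates the optimization difficulty. Our approach yields appealing results under various settings and substantially surpasses state-of-the-art alternatives, especially in terms of synthesis diversity. Notably, our method works well even with large domain gaps and converges robustly within a few minutes for each experiment.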
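SARS-CoV-2 is an RNA virus of the upper respiratory system that had caused over 3 million deaths worldwide as of May 2021. To date, SARS-CoV-2 mutations have posed significant challenges for scientists trying to keep pace with vaccine development and public health measures. An efficient method for identifying the divergence of laboratory samples from patients would therefore greatly aid the documentation of SARS-CoV-2 genomics. In this study, we propose a neural network model that uses recurrent and convolutional units to take in the amino acid sequences of spike proteins directly and classify the corresponding clades. We also compare our model's performance with that of Bidirectional Encoder Representations from Transformers (BERT) pre-trained on a protein database. Our approach has the potential to provide a more computationally efficient alternative to current homology-based intra-species differentiation.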
This technical report presents GPS++, the first-place solution to the Open Graph Benchmark Large-Scale Challenge (OGB-LSC 2022) for the PCQM4Mv2 molecular property prediction task. Our approach implements several key principles from the prior literature. At its core our GPS++ method is a hybrid MPNN/Transformer model that incorporates 3D atom positions and an auxiliary denoising task. The effectiveness of GPS++ is demonstrated by achieving 0.0719 mean absolute error on the independent test-challenge PCQM4Mv2 split. Thanks to Graphcore IPU acceleration, GPS++ scales to deep architectures (16 layers), training at 3 minutes per epoch, and large ensemble (112 models), completing the final predictions in 1 hour 32 minutes, well under the 4 hour inference budget allocated. Our implementation is publicly available at: https://github.com/graphcore/ogb-lsc-pcqm4mv2.
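Zero-shot action recognition (ZSAR) aims to recognize video actions that were never seen during training. Most existing methods assume a shared semantic space between seen and unseen actions and attempt to learn a mapping from the visual space to the semantic space directly. The semantic gap between the visual space and the semantic space challenges this approach. This paper presents a novel method that uses object semantics as privileged information to narrow the semantic gap and thereby effectively assist learning. In particular, a simple hallucination network is proposed to extract object semantics implicitly, without explicitly extracting objects, and a cross-attention module is developed to augment visual features with the object semantics. Experiments on the Olympic Sports, HMDB51, and UCF101 datasets show that the proposed method outperforms state-of-the-art methods.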
Conventional sensor-based localization relies on high-precision maps, which are generally built using specialized mapping techniques involving high labor and computational costs. In the architectural, engineering and construction industry, Building Information Models (BIM) are available and can provide informative descriptions of environments. This paper explores an effective way to localize a mobile 3D LiDAR sensor on BIM-generated maps considering both geometric and semantic properties. First, original BIM elements are converted to semantically augmented point cloud maps using categories and locations. After that, a coarse-to-fine semantic localization is performed to align laser points to the map based on iterative closest point registration. The experimental results show that the semantic localization can track the pose successfully with only one LiDAR sensor, thus demonstrating the feasibility of the proposed mapping-free localization framework. The results also show that using semantic information can help reduce localization errors on BIM-generated maps.
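Online algorithms are an important branch of algorithm design. Designing online algorithms with a bounded competitive ratio (in terms of worst-case performance) can be hard and usually relies on problem-specific assumptions. Inspired by the adversarial training of generative adversarial networks (GANs) and by the fact that the competitive ratio of an online algorithm is defined on worst-case inputs, we use deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch, with the goal of minimizing the performance gap between the offline optimum and the learned online algorithm on worst-case inputs. Specifically, we use two neural networks as the algorithm and the adversary, respectively, and let them play a zero-sum game: the adversary is responsible for generating worst-case inputs, while the algorithm learns the best strategy based on the inputs provided by the adversary. To ensure better convergence of the algorithm network (to the desired online algorithm), we propose a novel per-round update method for sequential decision making that breaks the complex dependency among different rounds, so that updates can be made for every possible action rather than only for sampled actions. To the best of our knowledge, our work is the first to use deep neural networks to design an online algorithm from the perspective of worst-case performance guarantees. Empirical studies show that our update method ensures convergence to the Nash equilibrium and that the learned algorithm outperforms state-of-the-art online algorithms under various settings.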
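The algorithm-versus-adversary game in the abstract above can be illustrated on a much smaller problem. The sketch below swaps in the textbook ski-rental problem (not the paper's resource allocation and pricing problem) and replaces both neural networks with brute-force search. All function names are hypothetical. The adversary picks the season length that maximizes the ratio of online cost to offline optimum, and the algorithm picks the buy day that minimizes that worst-case ratio.

```python
def online_cost(buy_day, season_len, buy_price):
    """Cost of renting (1 per day) until `buy_day`, then buying on that day."""
    if season_len < buy_day:
        return season_len                 # season ended before we ever bought
    return (buy_day - 1) + buy_price      # rented buy_day - 1 days, then bought


def offline_opt(season_len, buy_price):
    """Optimal cost when the season length is known in advance."""
    return min(season_len, buy_price)


def worst_case_ratio(buy_day, buy_price, max_len):
    """Adversary: choose the season length that maximizes the cost ratio."""
    return max(
        online_cost(buy_day, n, buy_price) / offline_opt(n, buy_price)
        for n in range(1, max_len + 1)
    )


def best_buy_day(buy_price, max_len):
    """Algorithm: choose the buy day that minimizes the worst-case ratio."""
    return min(
        range(1, max_len + 1),
        key=lambda k: worst_case_ratio(k, buy_price, max_len),
    )


if __name__ == "__main__":
    k = best_buy_day(buy_price=10, max_len=100)
    print(k, worst_case_ratio(k, 10, 100))
```

Exhaustive search is only feasible here because the toy strategy space is tiny. The abstract's approach instead trains the two players as neural networks, which this sketch does not attempt.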
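In recent years, larger and deeper models have been emerging and continuously pushing state-of-the-art (SOTA) results across fields such as natural language processing (NLP) and computer vision (CV). Despite these promising results, however, the computation required by SOTA models has been increasing at an exponential rate. Massive computation not only has a surprisingly large carbon footprint but also negatively affects research inclusiveness and deployment in real-world applications. Green deep learning is an increasingly active research field that calls on researchers to pay attention to energy usage and carbon emissions during model training and inference. The goal is to produce novel results with lightweight and efficient techniques. Many techniques can be used to achieve this goal, such as model compression and knowledge distillation. This paper presents a systematic review of the development of green deep learning techniques. We classify these approaches into four categories: (1) compact networks, (2) energy-efficient training strategies, (3) energy-efficient inference approaches, and (4) efficient data usage. For each category, we discuss the progress achieved and the unresolved challenges.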
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
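As a rough illustration of what "distilling token relations" can look like, the sketch below computes a softmax-normalized token-to-token relation matrix for a teacher and a student and compares them with a KL divergence. This is a simplified assumption about the form of the loss, written in plain Python with hypothetical function names; the TinyMIM repository linked above is the place to check the actual formulation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation(a, b):
    """Row-wise softmax of (a @ b^T) / sqrt(d) for token matrices a and b.

    a and b are lists of token vectors with the same hidden size d.
    The result is N x N regardless of d.
    """
    d = len(a[0])
    return [
        softmax([sum(x * y for x, y in zip(ra, rb)) / math.sqrt(d) for rb in b])
        for ra in a
    ]


def kl_rows(p, q):
    """Mean over rows of KL(p_row || q_row)."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p_row, q_row))
    return total / len(p)


def relation_distill_loss(teacher_q, teacher_k, student_q, student_k):
    """KL divergence between teacher and student token-relation matrices."""
    teacher_rel = relation(teacher_q, teacher_k)
    student_rel = relation(student_q, student_k)
    return kl_rows(teacher_rel, student_rel)
```

One property this sketch makes visible is that relation matrices have shape tokens by tokens, so the teacher and student can use different hidden sizes without an extra projection layer.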
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.